-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UnixFileSystem: read cached hashes from extended attributes #11662
UnixFileSystem: read cached hashes from extended attributes #11662
Conversation
Using user-settable attributes is not safe, and I would be against having this enabled by default because it's footgun territory. I'm not even sure this should be merged at all. Filesystem implementations that expose hashes through attributes, e.g., FUSE-based filesystems, can use a non-user-settable namespace. I'd prefer we aligned with Google on the name, but they may be using a 'google.' prefix, which would be undesirable (and it's not the end of the world if it differs, except that it has a silent failure mode if the attribute name doesn't match exactly). |
Aligning with Google means that they can use bazel unmodified with their existing filesystem implementations. |
Could you explain why this would be unsafe? My takeaway has been that getting/setting This means that if using
This is also fine by me. Any convention will do. Even |
Bazel would rely on an external contract for correctness: "the user has to make sure that anything modifying any file in the source tree will either update or remove the extended attribute". In most situations, there is nothing enforcing this contract, so it's clearly not safe for Bazel to rely on it, especially in the default configuration. Yes, there are cases where you can provide such guarantees, e.g., through a FUSE filesystem, or by controlling the writer and making the filesystem read-only. Since you're requiring significant additional infrastructure anyway, why would you then prefer user-settable attributes over non-user-settable attributes? In a FUSE filesystem, you can actually handle read/write access correctly, so you definitely don't want users to set them, so why use user-settable attributes? If you're going the other route, why not allow the writer to write non-user-settable attributes? |
Note that you only require this 'significant additional infrastructure' in case you require that this is not being tampered with. Even though Bazel has 'correctness' in its slogan, there is always a certain ceiling/limit where correctness breaks down, and that's fine. That's where the authors of a tool such as Bazel come up with a boundary. Here are some examples:
With that same reasoning, you could argue that having a read-only file that has user.checksum.sha256 set, but making it writable again and modifying its contents without adjusting the attribute as well is also invalid.
As the name implies: it limits this functionality to super users (or FUSE file systems, or ...). Why would I need administrative rights to create a data set on my system and annotate it with some checksums, just so I can instruct Bazel to use them? Should I file tickets against my sysadmin to get those added? Again, I have no strong opinion on it. As long as at the end of the day, we get a feature like this added. Any concrete proposals for a naming scheme other than what is suggested thus far? |
On Mon, Jun 29, 2020 at 4:59 PM Ed Schouten ***@***.***> wrote:
Since you're requiring significant additional infrastructure anyway, [...]
Note that you only require this 'significant additional infrastructure' in
case you require that this is not being tampered with. Even though Bazel
has 'correctness' in its slogan, there is always a certain ceiling/limit
where correctness breaks down, and that's fine. That's where the authors of
a tool such as Bazel come up with a boundary. Here are some examples:
- Do we expect that Bazel remains 'correct' in case of hardware memory
corruption? Of course not.
- Do we expect that build actions are still hermetic in case they do
symlink expansion, thereby finding the actual location of the files, and
start sniffing around there? Nope!
Unfortunately, we were unable to lock down the Linux sandbox to avoid
symlinks - in this specific case, a tradeoff between performance and
safety. If there's a Linux kernel that supports faster bind mounts, this
would probably change. MacOS simply doesn't provide the tools to enforce
this. In either case, this is definitely on the side of "we'd like to
enforce it if we could without making action execution super-slow".
(We were originally using Linux bind mounts until we discovered that their
performance is O(n^2) with the total number of bind mounts in the system.)
- Do we expect that Bazel remains 'correct' if I run a program in the
background that keeps on resetting file timestamps using utimensat(),
thereby making it harder for build systems to detect changes? Likely not.
Bazel uses ctime rather than mtime, which can't be modified by normal
users (AFAIK). (Except on Windows.)
With that same reasoning, you could argue that having a read-only file
that has user.checksum.sha256 set, but making it writable again and
modifying its contents without adjusting the attribute as well is also
invalid.
This is very different from hardware memory corruption. It provides users
with a clear and direct mechanism for breaking correctness. Just run
setxattr in your workspace and Bazel is no longer correct *for every
subsequent build* until the attribute is fixed or removed. I think this is
clearly beyond the line that we'd like to draw.
[...], why would you then prefer user-settable attributes over
non-user-settable attributes?
As the name implies: it limits this functionality to super users (or FUSE
file systems, or ...). Why would I need administrative rights to create a
data set on my system and annotate it with some checksums, just so I can
instruct Bazel to use them? Should I file tickets against my sysadmin to
get those added?
You can always run a modified Bazel binary if you feel like it.
… Again, I have no strong opinion on it. As long as at the end of the day,
we get a feature like this added. Any concrete proposals for a naming
scheme other than what is suggested thus far?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#11662 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABYD2YLQ7UPTRZZXIQ5LWRTRZCT6DANCNFSM4OLEEFKQ>
.
|
|
As I said above, I'd prefer to sync with Google. Not sure who, though. @jmmv @EricBurnett |
Drive-by comment to say that I am also uncomfortable with user-modifiable xattrs being used for this, but would love the ability to opt-in to something a FUSE filesystem can set. |
On macOS, attribute names are completely freeform. Even as a non-root user, I can do this:
On Linux, it's a different story. Attribute names have to be placed in namespaces. Only four namespaces are supported: security, system, trusted and user. At least on my system (stock Ubuntu 18.04 with ext4), permissions for those namespaces seem to be as follows:
This means that the properties as proposed by @ulfjack are seemingly impossible to achieve on macOS. On Linux, we can get that by calling the attributes EDIT: The Linux
This means that even access to Maybe the best option would be to stick to EDIT 2: If we're going to use a command line flag anyway, maybe we can avoid the entire naming discussion by letting that command line flag take the name of the extended attribute to use... Hmm.... |
baf2383
to
27847a4
Compare
This is also discussed in https://groups.google.com/g/bazel-dev/c/SH-Dnm5e0TE/m/uRUSD9UrBgAJ. FWIW, from my perspective this is a useful addition, I actually assumed it was already in given the thread above :). |
Looking at the patch as presented, it only reads attributes and does not write them. This seems pretty reasonable to me, regardless of namespace used: when directed to do so, bazel is trusting an external (unspecified) system to update extended attributes on files to match the hashes of their contents. Most likely to be used with FUSE filesystems and similar, which should be good for maintaining correctness in the event that inputs change, but if anything else is used to set these attributes, that system must also maintain the invariant that they do not get stale (however it chooses to achieve that). Yes, the user could shoot themselves in the foot on this, but that seems similar to me in how a user could use a bad filesystem that provides insufficiently atomic operations, or run some tool to fetch inputs that failed to update them in cases. Their ability to get themselves into a state where the extended attributes are wrong is similar in flavour to their ability to get themselves into a state where the input file bytes are wrong. And in particular, use of a protected namespace doesn't actually defend against it - just because it reduces potential writers of the attributes in question doesn't mean it prevents them from going stale any better? I think Ulf's line of argument is very valid in the event that bazel writes attributes, I'll note - bazel should not write and re-read attributes without strong expectation it can keep them consistent in the event of file changes, and may want to take precautions to isolate them further from outside corruption via namespaces. Couple comments on the patch though (as a drive-by non-expert on bazel code):
|
27847a4
to
1adc7f9
Compare
Hi Eric, Thanks for the review!
I don't think Linux/UNIX provides any facilities to stat() and getxattr() a file at once. It'll need to remain two separate system calls regardless of whether they are grouped.
I think that caching is done at a higher level. UnixFileSystem.getDigest() normally forwards the call super.super.super.[....].getDigest(), which eventually ends up at FileSystem.getDigest(). That function does the heavy work of checksumming the data that's on disk: protected byte[] getDigest(final Path path) throws IOException {
return new ByteSource() {
@Override
public InputStream openStream() throws IOException {
return getInputStream(path);
}
}.hash(digestFunction.getHashFunction()).asBytes();
} So my assumption is that this change does not disable/bypass caching.
Keys of extended attributes are null-terminated strings, while their values are supposedly binary safe. Tools like xattr(1) are also capable of displaying binary values well, which is why I decided to not do any hexadecimal -> binary conversion.
Yeah, that's a fair point, and also where I'd like your and the other Bazel folks' input. My mindset going into this was to put this logic in a decorator class that wraps an existing FileSystem, while overriding getDigest(). It looks like this isn't straight-forward, because FileSystem is not an interface. Coming from Go, Java also doesn't seem to have any facilities for easily forwarding methods, like Go's way of doing composition. That means we'll need to forward all the other methods manually. Letting this be part of UnixFileSystem makes it harder to unit test, because it directly calls into the system calls (through JNI, I suppose). What would be the best way for me to test this? |
Thanks for the quick answers Ed! Those all make sense to me. I'll defer to the bazel folk on the thread for the question of how to test though, as I don't know best practices here myself. (I mostly interact with bazel as a client tool that I look into for debugging, but don't really know how to update myself). In the abstract, a wrapper around getxattr would at least let you test against the interface you believe it holds, but probably you'll also need an integration test to test the round-trip against a real filesystem? |
help = | ||
"The name of an extended attribute that can be placed on files to store a precomputed " | ||
+ "copy of the file's hash, corresponding with --digest_function. This option " | ||
+ "can be used to reduce disk I/O and CPU load caused by hash computation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be clearer (and probably longer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulfjack any suggestions on how to improve the description?
src/main/java/com/google/devtools/build/lib/unix/UnixFileSystem.java
Outdated
Show resolved
Hide resolved
@janakdr thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having the xattr name as a startup option is reasonable. Inside Google, we'll need to force-set this option to the Google-internal xattr, but I don't think that should block this. If there's enough documentation/stern warnings around this, letting people specify the xattr themselves seems ok. Could potentially warn if the xattr has an "insecure shape", with another option to turn off that warning, although that may be overkill.
In terms of tests, you could write a Java integration test with a custom FileSystem that recorded calls, and assert that the desired ones were made. One example with a custom filesystem. Could also write a shell test that turned on profiling
add_to_blazerc "common --record_full_profiler_data --profile=$PROF"
add_to_blazerc "common --experimental_generate_json_trace_profile"
add_to_blazerc "common --noslim_profile"
and then counted the number of system call operations for each file. We have an internal test that does that.
src/main/java/com/google/devtools/build/lib/unix/UnixFileSystem.java
Outdated
Show resolved
Hide resolved
46dd576
to
531c3a3
Compare
Thanks for that link! One of the things I completely forgot is that it's valid in Java to inherit from a class and override one of the methods deep down. Calls to sibling methods in a base class also do a 'vtable lookup' (or however Java would call that). You tend to forget that as a Go programmer... Anyway, I've added a basic unit test that inherits from UnixFileSystem and overrides getxattr() to return a constant value. Calls to DigestUtils.getDigestWithManualFallback() should then succeed without doing any file I/O. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment on which Bazel "roots" you expect to have this xattr? Specifically, will this xattr be available for source files, output files, or both? If just one, then this will wastefully try to compute fast digests for every file under the other root. This is related to your concern about not using the file cache when using fast digests.
Inside Google, the same xattr provides a digest for both source and outputs. Unfortunately, at the FileSystem level, the distinction is lost, so I'm not sure how you could switch off between them here if only one has fast digests.
src/main/java/com/google/devtools/build/lib/unix/UnixFileSystem.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/devtools/build/lib/unix/UnixDigestHashAttributeNameTest.java
Show resolved
Hide resolved
531c3a3
to
6e074e4
Compare
Hi Janak!
Initially I'm only interested in using this on output files (#11703). In the long run I also want to use it for source files. So for me completely acceptable for me to call getxattr() globally. |
6e074e4
to
1034111
Compare
Drive by comment: might want to check out Facebooks piper/srcfs clone. They have xattrs implemented for buck https://github.com/facebookexperimental/eden/ |
Sorry, I was OOO all last week. I'll try to take a look this week, but it may be a few days as I dig out from my backlog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably good to document explicitly that setting this flag means that Bazel will try to get this xattr for every file, including both source and outputs, so that users aren't surprised if Bazel starts issuing more filesystem calls with this flag.
3cbdcf8
to
fab90a6
Compare
Done! |
In this PR for Bazel I am adding support for reading hashes from extended attributes: bazelbuild/bazel#11662 This can speed up Bazel's file system access, as it no longer needs to read the full file contents to compute the digest. The PR for Bazel does not set any standard for extended attribute naming. Let's go ahead and pick a naming scheme on our end. We could have also used Buildbox's "user.checksum.*", but the format of those extended attributes is different. Bazel expects binary hashes, while Buildbox uses base16 (hexadecimal) encoding.
bb83657
to
eb855f8
Compare
eb855f8
to
ce61a16
Compare
There are certain workloads where Bazel's running time gets dominated by checksum computation. Examples include: - People adding local_repository()s to their project that point to networked file shares. - The use of repositories that contain very large input files. When using remote execution, we need to compute digests to be able to place such files in input roots. In many cases, a centralized CAS will already contain these files. It would be nice if Bazel could efficiently check for existence of such objects without needing to scan the file locally. This change extends UnixFileSystem to call getxattr() on an attribute prior to falling back to reading file contents. The name of the extended attribute that is used is configurable through a command line flag. Using extended attributes to store this information also seems to be a fairly common approach. Apparently it is also used within Google itself: https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ So far no code has been added to let Bazel write these attributes to disk. The main goal so far is to speed up access to read-only corpora, where the maintainers have spent the effort adding these attributes.
ce61a16
to
3fa149a
Compare
Hey! I've just rebased the change on top of latest master once again. Is there anything else I need to do on my end? |
Whoops, my apologies, forgot that I needed to manually import! I've imported the pending change and will work on harmonizing this change with Google's internal hash setup. I'll update here as I go. |
Update: I've gotten rid of the gross global digest function state that we were using I currently have a bunch of commits in flight to stop using Path#getDigest (using DigestUtils instead, or getting metadata from elsewhere). Once that's done, I'll get rid of the FileSystem#getDigest overrides, like RemoteActionFileSystem (we have more internally). However, I think that this change is somewhat orthogonal to that clean-up, so I will continue to work on getting it into shape in parallel. |
…with DigestUtils#getDigestWithManualFallback or document why Path#getFastDigest is not useful to them. This is inspired by cleaning up in preparation for #11662: currently, some FileSystem implementations call Path#getFastDigest inside Path#getDigest because they have a fast digest implementation and want to guard against expensive digest computations. After this change, ~all overrides of FileSystem#getDigest can be removed, since the callers will already have called FileSystem#getFastDigest. PiperOrigin-RevId: 331142210
Awesome work, @janakdr! Would the changes you're working on also allow us to implement this feature using a plain option, instead of a startup option? Or would that still be a bit of a stretch? |
We never really recreate the FileSystem on a running server, so it might be tough even after these changes. We could discuss how to make that happen, and what the priority should be, though. |
Awesome! \o/ |
There are certain workloads where Bazel's running time gets dominated by checksum computation. Examples include: - People adding local_repository()s to their project that point to networked file shares. - The use of repositories that contain very large input files. When using remote execution, we need to compute digests to be able to place such files in input roots. In many cases, a centralized CAS will already contain these files. It would be nice if Bazel could efficiently check for existence of such objects without needing to scan the file locally. This change extends UnixFileSystem to call getxattr() on attribute ~~"user.checksum.${algo}" prior to falling back to reading file contents.~~ ~~There is no true standard on how these extended attributes should be~~ ~~called, but "user.checksum.${algo}" already has some precedent. It is,~~ ~~for example, used by BuildGrid internally:~~ ~~https://gitlab.com/BuildGrid/buildbox/buildbox-fuse/-/merge_requests/9~~ **EDIT:** The name of the extended attribute is now configurable. Using extended attributes to store this information also seems to be a fairly common approach. Apparently it is also used within Google itself: https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ So far no code has been added to let Bazel write these attributes to disk. The main goal so far is to speed up access to read-only corpora, where the maintainers have spent the effort adding these attributes. Closes bazelbuild#11662. (@janakdr made some modifications from the original pull request, mainly to deal with merge conflicts and address Google-internal style.) PiperOrigin-RevId: 332256967
There are certain workloads where Bazel's running time gets dominated by
checksum computation. Examples include:
networked file shares.
When using remote execution, we need to compute digests to be able to
place such files in input roots. In many cases, a centralized CAS will
already contain these files. It would be nice if Bazel could efficiently
check for existence of such objects without needing to scan the file
locally.
This change extends UnixFileSystem to call getxattr() on attribute
"user.checksum.${algo}" prior to falling back to reading file contents.There is no true standard on how these extended attributes should becalled, but "user.checksum.${algo}" already has some precedent. It is,for example, used by BuildGrid internally:https://gitlab.com/BuildGrid/buildbox/buildbox-fuse/-/merge_requests/9EDIT: The name of the extended attribute is now configurable.
Using extended attributes to store this information also seems to be a
fairly common approach. Apparently it is also used within Google itself:
https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ
So far no code has been added to let Bazel write these attributes to
disk. The main goal so far is to speed up access to read-only corpora,
where the maintainers have spent the effort adding these attributes.